Euclidean Bus Mobility and Route Optimization, A Comparison

Routes in Queens, New York City, NY

Author

Alan Vlakancic

Published

November 30, 2025

Introduction

This project uses stplanr transport modeling package to design an optimal transport route for bus or cycle routes in New York City. Stplanr is a transport planning visualization R package that can be used to plan transit networks, in addition to transit planning elements such as identifying transit catchment areas, origin/destination data and ride frequency visualizations, among others. In their white paper, the devlopers of stplanr call for an accountable, transparent and democratized transit planning system that doesn’t rely on proprietary and often vastly different data sources and data processing softwares. Although the package can visualize a whole host of data, this project will focus on comparing direct desire lines or “Euclidean” routes (as the crow flies), existing bus networks and stplanr’s optimized routes. To wit: this can map the efficiency of the bus routes compared to the most direct route possible if there were no built environment factors in the way.

Methods

To adequately compare desire lines, bus routes, and the most efficient routes with the current street network, this project will require at minimum three data sources. Each of these will be sourced separately and overlaid onto each other:

  • Data Source: leaflet data for basemap
    • Methods: This can be sourced directly into R by installing the leaflet package. It provides an interactive background map that can be used to overlay other spatial data, and the options can be toggled on and off for ease of use.
    • Source: Leaflet for R
  • Data Source: OpenStreetMap (OSMR) data for transit data, either bus or cycle routes, this is used as the route vector data.
    • Methods: This can be sourced directly into R by installing the package. You may need to rationalize different projection systems to make sure they overlay correctly.
    • Source: Mapping with OpenStreetMap in R
  • Data Source: NYC Open Data for Bus Shelter locations
    Despite significant searching, there is no comprehensive bus stop dataset, so the project will focus on bus stop shelters, which are mapped via NYC Open Data. I used Bus Shelters as there would be thousands upon thousands of bus stops in NYC, and this would be too computationally intensive to process.
    • Methods: This can be brought into R as a CSV file. Each bus stop shelter has longitude and latitude coordinates that align with leaflet and OpenStreetMap projections.
    • Source: NYC Bus Stop Shelters, NYC Transit analysis
  • Data Source: NYC Open Data for NYC Borough Boundaries

Data Preparation

  • Load the necessary R packages for spatial data manipulation and visualization (e.g., ggmap, dplyr, stplanr, osmdata, sf, leaflet).
  • Import the NYC basemap shapefile and bus shelter CSV data into R.
  • Convert the bus shelter data into an sf object with appropriate coordinate reference system (CRS).

Terms:

  • Desire Lines & “Euclidean”: Straight lines connecting origin and destination points, representing the most direct path between them.
  • OSRM: Open Street Routing Machine, a routing engine that uses Open Street Map data to calculate routes, shortest routes, travel times, and can be used to make travel time maps, distance routing for car, bike and walking.

Interactive Map:

Show code
#install libraries
library(ggmap) #vestigial
library(dplyr) #data manipulation
library(stplanr) #od2line calculation, origin-destination programming
library(osmdata) #open street maps data
library(sf) #spatial data
library(tidyverse) #data manipulation
library(leaflet) #interactive map
library(purrr) #for map2 function
library(kableExtra) #for attractive table formatting
library(scales) #for comma function
Show code
#SHAPEFILES AND MAP

bbox <- c(left = -73.96, bottom = 40.54, right = -73.70, top = 40.81)
#create bounding box for NYC

nyc_map <- "data/"#

#nyc basemap, downloaded from nyc open data. source: https://search.r-project.org/CRAN/refmans/ptools/html/nyc_bor.html

nyc_sf <- st_read(nyc_map, quiet =TRUE)
#bring in nyc_map as a sf

shelters_sf <- read_csv("data/Bus_Stop_Shelter_20251020.csv")
#NOTE: REPLACE WITH DATA WHEN USING QUARTO!
#this brings in the bus stop shelter information. source: https://data.cityofnewyork.us/Transportation/Bus-Stop-Shelters/qafz-7myz

shelters_sf_fix <- st_as_sf(shelters_sf, coords = c("Longitude","Latitude"), crs = 4326)
#convert to sf object with the correct coordinate reference system

osmdata::set_overpass_url("https://overpass-api.de/api/interpreter")
#set overpass url for open street maps, finds specific data package, the below query will use this

osm_data <- opq(bbox = bbox) %>%
  add_osm_feature(key = "highway", value = c("primary","secondary")) %>%
  osmdata_sf()
#import data for primary and secondary highways from open street maps
#opq is overpass query, which is basically where you want to look
Show code
# STPLANR FUNCTIONS

shelters_sf_fix <- shelters_sf_fix %>%
  mutate(id = paste0("S", row_number()))
#add ID column for origin-destination pairs so they have a corresponding number

flow_all <- expand.grid(o = shelters_sf_fix$id, #create origin
                        d = shelters_sf_fix$id, #create destination
                        stringsAsFactors = FALSE) %>% #make sure they aren't factors
  filter(o != d) %>% # this remove self-pairs so O is not D
  mutate(trips = 1) %>% #add trip count of 1 for each pair
  sample_n(50) #sample 50 random paris to avoid blowing up the computer

desire_lines_all <- od2line(flow_all, zones = shelters_sf_fix, zone_code = "id") #use od2line function to create desire lines (euclidean) for all pairs

shelter_coords <- shelters_sf_fix %>%
  st_coordinates() %>%
  as.data.frame() %>%
  bind_cols(id = shelters_sf_fix$id)
#extract coordinates and bind with ID column

route_single <- function(o_id, d_id) { #function to create a single route between origin and destination
  o <- shelter_coords %>% filter(id == o_id) 
  #filter to get origin coordinates
  d <- shelter_coords %>% filter(id == d_id)
  #filter to get destination coordinates

  r <- try(route_osrm(from = c(o$X, o$Y),
                      to   = c(d$X, d$Y)), silent = TRUE)
#use try to catch errors  (e.g., no route found)
  if (inherits(r, "try-error")) return(NULL)
#if route found, return the route
  return(r)
}

routes_list <- purrr::map2(flow_all$o, flow_all$d, route_single)
#create routes for all origin-destination pairs using the route_single function
routes_list <- routes_list[!sapply(routes_list, is.null)]
#remove any NULL results (failed routes)
routes_sf <- do.call(rbind, routes_list)
#combine all routes into a single sf object
Show code
#ROUTE & DESIRE LENGTH CALCULATION, MEAN & PCT CHANGE

routes_projection <- st_transform(routes_sf, 32618)
desire_projection <- st_transform(desire_lines_all, 32618)
#ensures the correct, projected shapefile for computation not mapping

route_length <- st_length(routes_projection)
desire_length <- st_length(desire_projection)
#compute lengths

route_length <- as.numeric(route_length)
desire_length <- as.numeric(desire_length)
#convert lengths to numeric values

lengths_tbl <- tibble(
  route_m  = route_length,
  desire_m = desire_length,
  origin = flow_all$o,
  destination = flow_all$d
)
#create tibble to compare lengths in the final map w/ IDs

lengths_tbl_print <- tibble(
  route_m  = comma(round(route_length)),
  desire_m = comma(round(desire_length)),
  origin = flow_all$o,
  destination = flow_all$d
)
#tidy data for later printing in a kable

mean_route <- mean(route_length, na.rm = TRUE)
mean_desire <- mean(desire_length, na.rm = TRUE)
#calculate means for both route and desire lengths

percent_change <- ((mean_route - mean_desire) / mean_desire) * 100
#calculate percent change 

mean_lengths <- data.frame(
  type = c("Route Length", "Desire Line Length"),
  mean_length_m = c(round(mean_route), round(mean_desire)))

#put these into a data frame, rounded to whole numbers


mean_lengths <- mean_lengths %>%
  mutate(
    mean_length_km = mean_length_m / 1000,
    percent_change = c(percent_change, NA)
  )
#add km conversion and percent change to the data frame, i converted to KM for ease of computation (ie: dividing by 1,000)

mean_lengths_print <- mean_lengths %>%
  mutate(
    mean_length_km = comma(round(mean_length_m / 1000)),
    percent_change = comma(round(percent_change))
  )
#tidy data for later printing in a kable
Show code
#LEAFLET PREP

nyc_leaflet  <- st_transform(nyc_sf, 4326)
roads_leaflet <- st_transform(osm_data$osm_lines, 4326)
desire_leaflet <- st_transform(desire_lines_all, 4326)
routes_leaflet <- st_transform(routes_sf, 4326)
#transform all data to WGS84 for leaflet mapping

desire_leaflet_popup <- paste0(
  "<b>Desire Line</b><br/>",
  "Origin: ", desire_leaflet$o, "<br/>",
  "Destination: ", desire_leaflet$d, "<br/>",
  "Desire Line Distance: ", round(lengths_tbl$desire_m / 1000), " km"
)
#create popup info for desire lines for interactive map
  
routes_leaflet_popup <- paste0(
  "<b>OSRM Route</b><br/>",
  "Origin: ", desire_leaflet$o, "<br/>",
  "Destination: ", desire_leaflet$d, "<br/>",
  "Route Distance: ", round(lengths_tbl$route_m / 1000), " km<br/>"
)
#create popup info for routes for interactive map

pal_desire <- colorNumeric(
  palette = "viridis",
  domain  = lengths_tbl$desire_m
)
#create color palette for desire lines based on distance

pal_routes <- colorNumeric(
  palette = "inferno",
  domain  = lengths_tbl$route_m
)
#create color palette for routes based on distance

selected_ids <- unique(c(flow_all$o, flow_all$d))
#get unique IDs of sampled shelters

selected_shelters <- shelters_sf_fix %>% 
  filter(id %in% selected_ids)
#filter shelters to only those that were sampled
Show code
#LEAFLET MAP

leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  
  addPolygons(data = nyc_leaflet,
              color = "black", weight = 3,
              fillOpacity = 0.1,
              group = "NYC Boundary") %>%
  # Add OSM roads w/ positron background
  
  addPolylines(
    data = desire_leaflet,
    color = pal_desire(lengths_tbl$desire_m),
    weight = 3,
    opacity = 0.7,
    popup = desire_leaflet_popup,
    group = "Desire Lines"
  ) %>%
  
  addPolylines(
    data = routes_leaflet,
    color = pal_routes(lengths_tbl$route_m),
    weight = 3,
    opacity = 0.8,
    popup = routes_leaflet_popup,
    group = "OSRM Routes"
  ) %>%
  
  addLegend(
    pal = pal_desire,
    values = lengths_tbl$desire_m,
    title = "Desire Line Distance (m)",
    position = "bottomright",
    group = "Desire Lines"
  ) %>%
  #add a legend for desire lines
  addLegend(
    pal = pal_routes,
    values = lengths_tbl$route_m,
    title = "OSRM Route Distance (m)",
    position = "bottomleft",
    group = "OSRM Routes"
  ) %>%
  #add a legend for osrm routes

    addCircleMarkers(data = shelters_sf_fix,
                   color = "blue",
                   radius = 1,
                   popup = ~id,
                   group = "Bus Shelters") %>%
  #add all bus shelters
  addCircleMarkers(
    data = selected_shelters,
    color = "purple",
    radius = 6,
    fillOpacity = 0.9,
    group = "Sampled Shelters"
  ) %>%
  #add sampled bus shelters with purple markers
  addLayersControl(
    overlayGroups = c("NYC Boundary", "Sampled Shelters",
                      "Desire Lines", "OSRM Routes",
                      "Bus Shelters"),
    options = layersControlOptions(collapsed = FALSE)
  )

Tables:

Show code
mean_lengths_print %>%
  kable(col.names = c("Type", "Mean Length (m)", "Mean Length (km)", "Percent Change (%)"),
        caption = "Mean Lengths of OSRM Routes vs Desire Lines") %>%
  kable_styling(full_width = FALSE, position = "left")
Mean Lengths of OSRM Routes vs Desire Lines
Type Mean Length (m) Mean Length (km) Percent Change (%)
Route Length 18495 18 24
Desire Line Length 14883 15 NA
Show code
lengths_tbl_print %>%
  kable(col.names = c("Route Length (m)", "Desire Line Length (m)", "Origin ID", "Destination ID"),
        caption = "Comparison of Route Lengths and Desire Line Lengths for Sampled Origin-Destination Pairs") %>%
  kable_styling(full_width = FALSE, position = "left")
Comparison of Route Lengths and Desire Line Lengths for Sampled Origin-Destination Pairs
Route Length (m) Desire Line Length (m) Origin ID Destination ID
3,222 2,571 S469 S451
16,870 11,093 S1708 S2522
15,190 13,561 S879 S297
28,745 25,158 S900 S1355
19,374 17,485 S2232 S2668
11,932 9,488 S2182 S1029
18,712 15,999 S1104 S1707
35,698 27,245 S992 S292
8,075 6,235 S2865 S1317
32,959 21,873 S1078 S2666
29,226 24,524 S2913 S2261
19,698 17,183 S2896 S667
25,982 22,626 S834 S964
8,647 6,845 S50 S2576
6,013 5,287 S426 S1843
12,327 10,880 S1966 S2874
22,964 10,001 S2820 S1038
19,683 14,326 S2501 S2342
16,348 14,004 S2386 S2563
23,423 20,394 S171 S2416
24,451 15,595 S3152 S1080
33,602 9,761 S418 S3337
8,318 7,461 S3162 S2919
21,735 16,438 S2651 S1088
14,096 12,445 S319 S2327
7,880 7,054 S1899 S2937
39,562 37,136 S3361 S1460
20,993 17,919 S262 S1283
20,604 12,059 S2895 S954
29,836 26,339 S1279 S24
2,546 1,993 S722 S78
12,822 10,614 S2505 S2594
4,998 4,810 S1571 S1675
8,678 7,484 S2580 S2957
8,738 7,338 S1314 S1146
8,537 7,928 S208 S894
24,940 22,271 S3167 S2143
27,794 24,405 S3299 S3043
19,430 15,656 S169 S1216
15,104 12,305 S2356 S633
18,870 13,620 S2603 S2176
11,021 7,846 S1267 S2399
36,977 34,530 S3259 S1531
22,052 19,959 S1816 S2789
17,855 16,619 S893 S1289
22,629 20,027 S889 S1838
34,323 28,387 S2532 S3241
16,506 15,383 S2929 S2137
5,437 5,208 S1922 S1466
9,319 8,786 S2431 S2469

Results

The mean route length for the optimized routes for this particular sample run is 18.49 km, while the mean Euclidean desire line length is 14.88 km. This represents a percent change of 24.27% longer for the optimized routes compared to the direct desire lines. The interactive map above visualizes these routes, with desire lines colored based on their lengths and Open Street Routing Machine (OSRM) routes similarly colored with a different theme.

Discussion

The results indicate that the optimized OSRM routes are significantly longer than the direct desire lines, which is generally expected given the constraints of the built environment and road network. The percent change of 24.27% suggests that while the desire lines represent the most direct path between two points, real-world travel must navigate around obstacles, follow roadways, and adhere to traffic regulations etc.

Limitations

  • This does not represent all bus stops in NYC, just shelters. Although the exact number of bus stops is difficult to find, the MTA states that there are 327 bus routes in the five boroughs and countless stops in between. To make the data manageable both in computation and visualization, this study only selects 50 at random. This limits the amount of data points and does not fully capture the bus network.
  • The OSRM routing service may not always find a route between two points, especially if they are very close together or in areas with limited road connectivity. The code removes these unroutable routes, and they are not shown in the data.
  • The analysis does not account for real-world factors such as traffic, hazards, closures, road conditions, bus “bunching” or transit schedules, which can significantly impact actual travel times and route efficiency.The analysis assumes that the shortest path is the most efficient, which may not always be the case in real-world scenarios.
  • The sample size of 50 origin-destination pairs is relatively small and is not be representative of the entire bus network in NYC.
  • Only “Primary” and “Secondary” roads are sampled here, as the computation for the smaller roads (tertiary etc.) was too processing heavy. This eliminates a large selection of routes.

Future:

Future research could expand the sample size to include more origin-destination pairs, or even all bus stops. Incorporating real-world travel time data, traffic patterns, and transit schedules could provide a more comprehensive understanding of route efficiency. Further analysis could also explore the impact of different modes of transportation, such as cycling or walking, on route optimization and efficiency.

References

  • A. J. Smit. Mapping with Google in R. Source.
  • CRAN. stplanr Paper and Vignettes. Source.
  • NYC Open Data. Bus Stop Shelters Dataset. Source.
  • Robin Lovelace. stplanr: R Package Documentation. Source.
  • openstreetmap.de. OpenStreetMap Routing Service. Source.
  • RStudio. Leaflet for R. Source.
  • MTA. New York City Transit Subway and Bus Facts 2019. Source.
  • NYC.gov. NYC Borough Boundaries Dataset. Source.